GitHub Repo: https://github.com/kennethkn/congresstweets-analysis

Introduction

This project aims to foster a boarder understanding of the bipartisan U.S. politics by analyzing the tweets of U.S. Congress members, including Democratic and Republican senators and representatives. The importance of it lies in the potential to reveal patterns and trends in the political discourse of recent years. Understanding these patterns can provide insights into the priorities and strategies of the two parties. Additionally, analyzing the sentiment of tweets can reveal their stance on current issues.

Data

The data used in this analysis is a collection of tweets from U.S. Congress members, available on GitHub at https://github.com/alexlitel/congresstweets. Given the enormous size of the dataset (~4M entries), I have chosen a database approach to store and query the data. The database is hosted locally on my computer via PostgresSQL, but you can reproduce the database by executing Python scripts in the scripts folder, which holds scripts for database construction as well as text mining. For more information on how to use them, please refer to README. This R Markdown document will query the database to perform the analysis and generate the plots.

Setup

First, we need to load the necessary libraries and connect to the database.

library(DBI)
library(RPostgres)
library(ggplot2)
library(tidytext)
library(ggpattern)
readRenviron(".env")
DATABASE_URL <- Sys.getenv("DATABASE_URL")

url_parts <- unlist(strsplit(DATABASE_URL, "://|@|:|/"))

driver <- url_parts[1]
user <- url_parts[2]
host <- url_parts[3]
port <- url_parts[4]
dbname <- url_parts[5]

con <- dbConnect(RPostgres::Postgres(), dbname = dbname, host = host, port = port, user = user)
D_HEX <- "#00AEF3"
R_HEX <- "#E81B23"
NA_HEX <- "grey"

HOUSE_HEX <- "#F8766D"
SENATE_HEX <- "#00BFC4"

Data Exploration

Let’s start by exploring the structure of the database.

# List tables
dbListTables(con)
## [1] "members"         "accounts"        "tweets"          "alembic_version"
# Row count and column names for "tweets" table
dbGetQuery(con, "SELECT COUNT(*) FROM tweets")
##     count
## 1 4310843
dbGetQuery(con, "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'tweets'")
##        column_name                   data_type
## 1  sentiment_score            double precision
## 2             time timestamp without time zone
## 3             link           character varying
## 4             text           character varying
## 5           source           character varying
## 6         hashtags                       ARRAY
## 7         mentions                       ARRAY
## 8       retweeting           character varying
## 9     cleaned_text           character varying
## 10          tokens                       ARRAY
## 11              id           character varying
## 12         quoting           character varying
## 13      account_id           character varying
# Row count and column names for "members" table
dbGetQuery(con, "SELECT COUNT(*) FROM members")
##   count
## 1   730
dbGetQuery(con, "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'members'")
##   column_name         data_type
## 1          id           integer
## 2        name character varying
## 3     chamber character varying
## 4       party character varying
# Row count and column names for "accounts" table
dbGetQuery(con, "SELECT COUNT(*) FROM accounts")
##   count
## 1  1487
dbGetQuery(con, "SELECT column_name, data_type FROM information_schema.columns WHERE table_name = 'accounts'")
##    column_name         data_type
## 1    member_id           integer
## 2           id character varying
## 3       handle character varying
## 4 account_type character varying
## 5 prev_handles             ARRAY

Now, let’s take a look at the first few rows of each table.

dbGetQuery(con, "SELECT * FROM tweets LIMIT 6")
##                    id account_id                time
## 1 1004924628584652801   41363507 2018-06-07 23:14:36
## 2 1005541414275506176   47747074 2018-06-09 16:05:29
## 3 1006194503823384576  248699486 2018-06-11 11:20:38
## 4 1006601144259317761  381152398 2018-06-12 14:16:29
## 5 1006541101862449152  248699486 2018-06-12 10:17:53
## 6 1007004577948766208   18137749 2018-06-13 16:59:35
##                                                                  link
## 1     https://www.twitter.com/DeanHeller/statuses/1004924628584652801
## 2    https://www.twitter.com/brianschatz/statuses/1005541414275506176
## 3  https://www.twitter.com/RepJeffDenham/statuses/1006194503823384576
## 4 https://www.twitter.com/RepTerriSewell/statuses/1006601144259317761
## 5  https://www.twitter.com/RepJeffDenham/statuses/1006541101862449152
## 6 https://www.twitter.com/ChrisVanHollen/statuses/1007004577948766208
##                                                                                                         text
## 1                                                                                                 #vegasborn
## 2                                                           ! http://pbs.twimg.com/media/DfRmllaVMAAuKPZ.jpg
## 3                                                             http://pbs.twimg.com/media/Dfa4jtkWsAAvTmt.png
## 4                                                             http://pbs.twimg.com/media/DfgqYm_WkAUBzip.jpg
## 5                                                             http://pbs.twimg.com/media/DffztXzW0AIZ5em.png
## 6 https://www.pscp.tv/w/bfF9NzF4a2pEWURHVmFlRXp8MWxQS3FrcmFNcEFKYpSRL-75_Xnp5Vp6sLudNJ1rwlAakc3XZdH1N_y9v_Es
##               source     hashtags mentions retweeting sentiment_score
## 1 Twitter for iPhone {#vegasborn}       {}       <NA>               0
## 2 Twitter for iPhone           {}       {}       <NA>               0
## 3          TweetDeck           {}       {}       <NA>               0
## 4 Twitter Web Client           {}       {}       <NA>               0
## 5          TweetDeck           {}       {}       <NA>               0
## 6          Periscope           {}       {}       <NA>               0
##   cleaned_text      tokens quoting
## 1   #vegasborn {vegasborn}    <NA>
## 2            !          {}    <NA>
## 3                       {}    <NA>
## 4                       {}    <NA>
## 5                       {}    <NA>
## 6                       {}    <NA>
dbGetQuery(con, "SELECT * FROM members LIMIT 6")
##    id          name chamber party
## 1 188     Don Young   house     R
## 2 189 Bradley Byrne   house     R
## 3 190   Felix Moore   house     R
## 4 191   Gary Palmer   house     R
## 5 192    Jerry Carl   house     R
## 6 193   Martha Roby   house     R
dbGetQuery(con, "SELECT * FROM accounts LIMIT 6")
##                    id        handle account_type prev_handles member_id
## 1            37007274   repdonyoung       office         <NA>       188
## 2          2559398984    DonYoungAK     campaign         <NA>       188
## 3            42481696  BradleyByrne     campaign         <NA>       189
## 4          2253968388      RepByrne       office         <NA>       189
## 5            74054398    BarryMoore     campaign         <NA>       190
## 6 1339006078688825344 RepBarryMoore       office         <NA>       190

We can join the tables to get more information about the authors of the tweets.

dbGetQuery(con, "SELECT * FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id LIMIT 6")
##                    id account_id                time
## 1 1007004577948766208   18137749 2018-06-13 16:59:35
## 2 1007335177838845953   19407835 2018-06-14 14:53:16
## 3 1008024813384949765   58579942 2018-06-16 12:33:38
## 4 1009479861910605824  234014087 2018-06-20 12:55:28
## 5 1009508199190257664  234014087 2018-06-20 14:48:05
## 6 1009873147817013253  235271965 2018-06-21 14:58:15
##                                                                  link
## 1 https://www.twitter.com/ChrisVanHollen/statuses/1007004577948766208
## 2 https://www.twitter.com/RepMikeCoffman/statuses/1007335177838845953
## 3      https://www.twitter.com/mattgaetz/statuses/1008024813384949765
## 4      https://www.twitter.com/RepWilson/statuses/1009479861910605824
## 5      https://www.twitter.com/RepWilson/statuses/1009508199190257664
## 6 https://www.twitter.com/RepStevePearce/statuses/1009873147817013253
##                                                                                                         text
## 1 https://www.pscp.tv/w/bfF9NzF4a2pEWURHVmFlRXp8MWxQS3FrcmFNcEFKYpSRL-75_Xnp5Vp6sLudNJ1rwlAakc3XZdH1N_y9v_Es
## 2                                                             http://pbs.twimg.com/media/DfrFwvqXcAYzxj8.jpg
## 3                                                                                                   @WKRG MC
## 4                                                             http://pbs.twimg.com/media/DgFP93jWsAEMYBJ.jpg
## 5                                                             http://pbs.twimg.com/media/DgFP93jWsAEMYBJ.jpg
## 6                                                             http://pbs.twimg.com/media/DgPKRILWAAE8JlG.jpg
##               source hashtags mentions retweeting sentiment_score cleaned_text
## 1          Periscope       {}       {}       <NA>               0
## 2 Twitter Web Client       {}       {}       <NA>               0
## 3 Twitter for iPhone       {}   {WKRG}       <NA>               0     @WKRG MC
## 4       Media Studio       {}       {}       <NA>               0
## 5       Media Studio       {}       {}       <NA>               0
## 6 Twitter Web Client       {}       {}       <NA>               0
##      tokens quoting    id..14         handle account_type prev_handles
## 1        {}    <NA>  18137749 ChrisVanHollen       office         <NA>
## 2        {}    <NA>  19407835 RepMikeCoffman       office         <NA>
## 3 {wkrg,mc}    <NA>  58579942      mattgaetz     campaign         <NA>
## 4        {}    <NA> 234014087      RepWilson       office         <NA>
## 5        {}    <NA> 234014087      RepWilson       office         <NA>
## 6        {}    <NA> 235271965 RepStevePearce       office         <NA>
##   member_id id..19             name chamber party
## 1       847    847 Chris Van Hollen  senate     D
## 2       287    287     Mike Coffman   house     R
## 3       322    322       Matt Gaetz   house     R
## 4       312    312 Frederica Wilson   house     D
## 5       312    312 Frederica Wilson   house     D
## 6       552    552     Steve Pearce   house     R

What is the time range of the tweets?

dbGetQuery(con, "SELECT MIN(time), MAX(time) FROM tweets")
##                   min                 max
## 1 2017-06-21 00:02:47 2023-07-11 17:00:02

We have tweets from June 2017 to July 2023. Since the data for 2017 and 2023 are incomplete, we will focus on the period from 2018 to 2022 for any time series analysis, which includes the 2nd session of the 115th Congress to the 1st session of the 117th Congress.

For reference: 115th Congress (2017-2019): Republican majority in both chambers 116th Congress (2019-2021): Democratic majority in the House, Republican majority in the Senate 117th Congress (2021-2023): Democratic majority in both chambers

Let’s also find out the possible values for categorical columns.

dbGetQuery(con, "SELECT DISTINCT chamber FROM members")
##   chamber
## 1  senate
## 2   house
dbGetQuery(con, "SELECT DISTINCT party FROM members")
##   party
## 1  <NA>
## 2     R
## 3     D
dbGetQuery(con, "SELECT DISTINCT account_type FROM accounts")
##   account_type
## 1       office
## 2     campaign

How many tweets are we dealing with?

dbGetQuery(con, "SELECT COUNT(*) FROM tweets")
##     count
## 1 4310843

There are a total of 4,310,843 tweets posted by U.S. Congress members from June 2017 to July 2023.

Data Analysis

Table of Contents:

  1. Tweet Count by Party and Year
  2. Tweet Count by Chamber and Year
  3. Top Tweeters by Year
  4. Top Hashtags
  5. Top Hashtags by Party
  6. Top Hashtags by Party and Year
  7. Top Hashtags by Chamber
  8. Top Words
  9. Top Words by Party
  10. Top Words by Party and Year
  11. Top Words by Chamber
  12. Sentiment Analysis by Party and Year
  13. Sentiment Analysis by Chamber and Year
  14. Sentiment Analysis by Topic and Party
  15. Top Accounts Retweeted
  16. Top Accounts Retweeted by Party
  17. Top Accounts Quoted
  18. Top Accounts Quoted by Party
  19. Top Accounts Mentioned
  20. Top Accounts Mentioned by Party

Tweet Count by Party and Year

Let’s start by counting the number of tweets by party affiliation of the tweet author.

tweet_count_by_party <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id GROUP BY party")
tweet_count_by_party$percentage <- tweet_count_by_party$count / sum(tweet_count_by_party$count) * 100
tweet_count_by_party
##   party   count percentage
## 1     D 2504716 58.1026959
## 2     R 1763583 40.9103973
## 3  <NA>   42544  0.9869067
pie(tweet_count_by_party$count, labels = paste0(c("Democratic Party", "Republican Party", "Independent"), " (", round(tweet_count_by_party$percentage, 2), "%)"), main = "Tweets of U.S. Congress Members, by Party of Tweet Author, 2017-2023", col = c(D_HEX, R_HEX, NA_HEX))

Democratic Congress members have tweeted the most from 2017 to 2023, accounting for nearly 60% of all tweets, followed by Republican members at around 40%. Independent members have tweeted the least at less than 1%.

Now let’s see how the number of tweets has evolved over time.

tweet_count_by_party_2018 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY party")
tweet_count_by_party_2019 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY party")
tweet_count_by_party_2020 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY party")
tweet_count_by_party_2021 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY party")
tweet_count_by_party_2022 <- dbGetQuery(con, "SELECT party, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY party")
tweet_count_by_party_2018$year <- 2018
tweet_count_by_party_2019$year <- 2019
tweet_count_by_party_2020$year <- 2020
tweet_count_by_party_2021$year <- 2021
tweet_count_by_party_2022$year <- 2022
tweet_count_by_party_ts <- rbind(tweet_count_by_party_2018, tweet_count_by_party_2019, tweet_count_by_party_2020, tweet_count_by_party_2021, tweet_count_by_party_2022)

tweet_count_ts <- aggregate(count ~ year, tweet_count_by_party_ts, sum)
tweet_count_ts
##   year  count
## 1 2018 638951
## 2 2019 721569
## 3 2020 882284
## 4 2021 738086
## 5 2022 721410
tweet_count_by_party_ts
##    party  count year
## 1      D 367343 2018
## 2      R 263391 2018
## 3   <NA>   8217 2018
## 4      D 454744 2019
## 5      R 256210 2019
## 6   <NA>  10615 2019
## 7      D 538642 2020
## 8      R 332261 2020
## 9   <NA>  11381 2020
## 10     D 412416 2021
## 11     R 321545 2021
## 12  <NA>   4125 2021
## 13     D 395843 2022
## 14     R 322335 2022
## 15  <NA>   3232 2022
ggplot(tweet_count_ts, aes(x = year, y = count)) +
  geom_point() +
  geom_line() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Tweets of U.S. Congress Members, by Year, 2018-2022", x = "Year", y = "Number of Tweets") +
  theme_minimal()

ggplot(tweet_count_by_party_ts, aes(x = year, y = count, fill = party)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c(D_HEX, R_HEX, NA_HEX), labels = c("Democratic", "Republican", "Independent")) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Tweets of U.S. Congress Members, by Party of Tweet Author and Year, 2018-2022", x = "Year", y = "Number of Tweets", fill = "Party") +
  theme_minimal()

The line plot shows the number of tweets peaked in 2020, aka election year, following progressive growth from 2018. Notably, it did not fall back to 2018 levels after the election, suggesting that the members have continued to use X as a platform to communicate with the public after enthusiasm from the election.

The bar plot shows that the Democratic members have tweeted more than the Republican members in each year. Interestingly, the number of tweets from Republican members sustained at a significantly higher level than in 2018 following an abrupt increase in 2020, while the same cannot be said for the Democratic members, suggesting that the Republican members are the main drivers of the overall increase in the number of tweets after the election.

Tweet Count by Chamber and Year

Next, we do the same, but by chamber of the tweet author.

tweet_count_by_chamber <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id GROUP BY chamber")
tweet_count_by_chamber$per_seat_rate <- tweet_count_by_chamber$count / ifelse(tweet_count_by_chamber$chamber == "house", 435, 100)
tweet_count_by_chamber$percentage <- tweet_count_by_chamber$per_seat_rate / sum(tweet_count_by_chamber$per_seat_rate) * 100
tweet_count_by_chamber
##   chamber   count per_seat_rate percentage
## 1   house 3258025      7489.713   41.56819
## 2  senate 1052818     10528.180   58.43181
pie(tweet_count_by_chamber$per_seat_rate, labels = paste0(c("House", "Senate"), " (", round(tweet_count_by_chamber$percentage, 2), "%)"), main = "Tweets of U.S. Congress Members, by Chamber of Tweet Author, 2017-2023", col = c(HOUSE_HEX, SENATE_HEX))
title(sub = "Note: Per-seat figures are used to account for the difference in the number of seats between the House and the Senate")

Members of the Senate have tweeted more than members of the House on a per-seat basis by around 15%, indicating that Senate members are on average more active on Twitter than House members.

tweet_count_by_chamber_2018 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY chamber")
tweet_count_by_chamber_2019 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY chamber")
tweet_count_by_chamber_2020 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY chamber")
tweet_count_by_chamber_2021 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY chamber")
tweet_count_by_chamber_2022 <- dbGetQuery(con, "SELECT chamber, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY chamber")
tweet_count_by_chamber_2018$year <- 2018
tweet_count_by_chamber_2019$year <- 2019
tweet_count_by_chamber_2020$year <- 2020
tweet_count_by_chamber_2021$year <- 2021
tweet_count_by_chamber_2022$year <- 2022
tweet_count_by_chamber_ts <- rbind(tweet_count_by_chamber_2018, tweet_count_by_chamber_2019, tweet_count_by_chamber_2020, tweet_count_by_chamber_2021, tweet_count_by_chamber_2022)
tweet_count_by_chamber_ts$per_seat_rate <- tweet_count_by_chamber_ts$count / ifelse(tweet_count_by_chamber_ts$chamber == "house", 435, 100)
tweet_count_by_chamber_ts
##    chamber  count year per_seat_rate
## 1    house 453734 2018      1043.067
## 2   senate 185217 2018      1852.170
## 3    house 546119 2019      1255.446
## 4   senate 175450 2019      1754.500
## 5    house 664733 2020      1528.122
## 6   senate 217551 2020      2175.510
## 7    house 584241 2021      1343.083
## 8   senate 153845 2021      1538.450
## 9    house 559701 2022      1286.669
## 10  senate 161709 2022      1617.090
ggplot(tweet_count_by_chamber_ts, aes(x = year, y = per_seat_rate, fill = chamber)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c(HOUSE_HEX, SENATE_HEX), labels = c("House", "Senate")) +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Tweets of U.S. Congress Members, by Chamber of Tweet Author and Year, 2018-2022", x = "Year", y = "Number of Tweets per Seat", fill = "Chamber", subtitle = "Note: Per-seat figures are used to account for the difference in the number of seats between the House and the Senate") +
  theme_minimal()

Members of the Senate has tweeted more than members of the House on a per-seat basis in each year, which is consistent with the previous observation. The number of tweets from both chambers peaked in 2020, but the increase was more pronounced in the Senate, suggesting that the Senate members showed more enthusiasm in using X as a platform to communicate with the public during the election.

Top Tweeters by Year

Moving on, let’s see some names. Who are the top tweeters in each year?

tweet_count_max_2018 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2019 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2020 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2021 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2022 <- dbGetQuery(con, "SELECT name, chamber, party, account_type, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY name, chamber, party, account_type ORDER BY count DESC LIMIT 10")
tweet_count_max_2018$year <- 2018
tweet_count_max_2019$year <- 2019
tweet_count_max_2020$year <- 2020
tweet_count_max_2021$year <- 2021
tweet_count_max_2022$year <- 2022
tweet_count_max_ts <- rbind(tweet_count_max_2018, tweet_count_max_2019, tweet_count_max_2020, tweet_count_max_2021, tweet_count_max_2022)
tweet_count_max_ts
##                   name chamber party account_type count year
## 1           Billy Long   house     R     campaign 19672 2018
## 2            Lacy Clay   house     D     campaign  8779 2018
## 3            Don Beyer   house     D       office  6004 2018
## 4      Pramila Jayapal   house     D       office  5087 2018
## 5             Ted Lieu   house     D     campaign  4217 2018
## 6            Ro Khanna   house     D     campaign  4187 2018
## 7          John Cornyn  senate     R       office  4035 2018
## 8          Orrin Hatch  senate     R       office  3954 2018
## 9     Marsha Blackburn  senate     R     campaign  3862 2018
## 10 Ileana Ros-Lehtinen   house     R       office  3730 2018
## 11          Billy Long   house     R     campaign 13196 2019
## 12            Chip Roy   house     R     campaign 10978 2019
## 13           Lacy Clay   house     D     campaign  6005 2019
## 14           Don Beyer   house     D       office  5922 2019
## 15      Joaquin Castro   house     D     campaign  5828 2019
## 16          Andy Biggs   house     R       office  5268 2019
## 17    Elizabeth Warren  senate     D     campaign  4830 2019
## 18     Pramila Jayapal   house     D       office  4521 2019
## 19      Bernie Sanders  senate  <NA>     campaign  4498 2019
## 20         John Cornyn  senate     R       office  4465 2019
## 21            Chip Roy   house     R     campaign 14191 2020
## 22          Billy Long   house     R     campaign 14173 2020
## 23         John Cornyn  senate     R       office  8685 2020
## 24          Andy Biggs   house     R       office  7290 2020
## 25        Dwight Evans   house     D       office  6716 2020
## 26           Don Beyer   house     D       office  6154 2020
## 27            Ted Cruz  senate     R     campaign  4859 2020
## 28        Tim Burchett   house     R     campaign  4462 2020
## 29     Pramila Jayapal   house     D       office  4446 2020
## 30       Ruben Gallego   house     D     campaign  4259 2020
## 31           Don Beyer   house     D       office  5878 2021
## 32         John Cornyn  senate     R       office  5762 2021
## 33      Claudia Tenney   house     R     campaign  5451 2021
## 34        Dwight Evans   house     D       office  5290 2021
## 35     Pramila Jayapal   house     D       office  4930 2021
## 36          Billy Long   house     R     campaign  4819 2021
## 37            Ted Cruz  senate     R     campaign  4338 2021
## 38       Thomas Massie   house     R       office  3907 2021
## 39         Sean Casten   house     D       office  3895 2021
## 40           Don Bacon   house     R       office  3792 2021
## 41          Billy Long   house     R     campaign 10170 2022
## 42            Chip Roy   house     R     campaign  7354 2022
## 43       Bill Pascrell   house     D       office  5972 2022
## 44            Ted Cruz  senate     R     campaign  5466 2022
## 45         John Cornyn  senate     R       office  5367 2022
## 46       Eric Swalwell   house     D     campaign  5093 2022
## 47           Don Bacon   house     R     campaign  4751 2022
## 48           Don Beyer   house     D       office  4535 2022
## 49        Dwight Evans   house     D       office  4499 2022
## 50      Adam Kinzinger   house     R     campaign  3915 2022
ggplot(tweet_count_max_ts, aes(x = count, y = reorder_within(name, count, year))) +
  geom_col(aes(fill = party)) +
  geom_col_pattern(aes(pattern = chamber), pattern_key_scale_factor = 0.5, fill = NA, color = "white", pattern_color = "white") +
  facet_wrap(~year, scales = "free_y") +
  labs(title = "Top 10 Congress Members with the Most Tweets, by Year", x = "Number of Tweets", y = "Congress Member", fill = "Party", pattern = "Chamber") +
  scale_fill_manual(values = c(D_HEX, R_HEX, NA_HEX), labels = c("Democratic", "Republican", "Independent")) +
  scale_pattern_manual(values = c("stripe", "crosshatch"), labels = c("House", "Senate")) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_reordered(labels = function(x) gsub("___\\d{4}", "", x)) +
  theme_minimal()

The top tweeters are evenly distributed between the two parties, with representation from both chambers that are consistent with the ratio of seats. We can see that the top tweeters are largely the same in each year, suggesting that some Congress members are consistently using X as a major platform to communicate with the public or deliver campaign messages, such as Rep. Billy Long (R), Rep. Don Beyer (D), Sen. John Cornyn (R). Notably, Texas senator Ted Cruz (R) appeared in the list in 2020 and has stayed in the top 10 since.

Top Hashtags

hashtags_top_30 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets) AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 30")
hashtags_top_30$rank <- rank(-hashtags_top_30$count)
hashtags_top_30
##                hashtag count rank
## 1             #COVID19 61091    1
## 2        #ForThePeople 17855    2
## 3                #SOTU 14997    3
## 4         #coronavirus 13335    4
## 5     #BuildBackBetter 13177    5
## 6  #AmericanRescuePlan 10769    6
## 7          #GOPTaxScam  9933    7
## 8              #SCOTUS  8430    8
## 9      #ProtectOurCare  8116    9
## 10          #TaxReform  7778   10
## 11               #DACA  7690   11
## 12     #EndGunViolence  7129   12
## 13      #NetNeutrality  6937   13
## 14         #2020Census  6893   14
## 15  #BlackHistoryMonth  6870   15
## 16      #TrumpShutdown  6804   16
## 17     #ChildTaxCredit  6770   17
## 18              #utpol  6637   18
## 19               #NY21  6545   19
## 20         #PuertoRico  6490   20
## 21              #gapol  6117   21
## 22         #GetCovered  5922   22
## 23  #BidenBorderCrisis  5899   23
## 24              #USMCA  5859   24
## 25               #AZ01  5781   25
## 26              #ICYMI  5632   26
## 27           #BREAKING  5445   27
## 28                #ACA  5409   28
## 29 #WomensHistoryMonth  5257   29
## 30               #MAGA  4999   30
ggplot(hashtags_top_30, aes(x = count, y = reorder(hashtag, count))) +
  geom_col() +
  geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
  labs(title = "Top 30 Most Common Hashtags Used by Congress Members in Tweets, 2017-2023", x = "Number of Tweets", y = "Hashtag") +
  scale_x_continuous(labels = scales::comma) +
  theme_minimal()

The most common hashtag used by Congress members is “#COVID19”, followed by “#ForThePeople” and “#SOTU”. COVID-19 was a major event that affected the world in 2020 and 2021, so it is not surprising that it was the most common hashtag used by Congress members. The hashtag “#ForThePeople” is related to the Democratic Party’s slogan for the 2020 election, while “#SOTU” is related to the annual State of the Union address by the President. Overall, hashtags used by Democratic members seem to be more prevalent than those used by Republican members in the top 30.

Top Hashtags by Party

Let’s also analyze with respect to party.

hashtags_top_30_d <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 30")
hashtags_top_30_r <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 30")
hashtags_top_30_d$party <- "Democratic Party"
hashtags_top_30_d$rank <- rank(-hashtags_top_30_d$count)
hashtags_top_30_r$party <- "Republican Party"
hashtags_top_30_r$rank <- rank(-hashtags_top_30_r$count)
hashtags_top_30_by_party <- rbind(hashtags_top_30_d, hashtags_top_30_r)
hashtags_top_30_by_party
##                       hashtag count            party rank
## 1                    #COVID19 42401 Democratic Party    1
## 2               #ForThePeople 17744 Democratic Party    2
## 3            #BuildBackBetter 12997 Democratic Party    3
## 4         #AmericanRescuePlan 10719 Democratic Party    4
## 5                 #GOPTaxScam  9911 Democratic Party    5
## 6                       #SOTU  8218 Democratic Party    6
## 7             #ProtectOurCare  8095 Democratic Party    7
## 8                #coronavirus  7254 Democratic Party    8
## 9             #EndGunViolence  7123 Democratic Party    9
## 10                      #DACA  6839 Democratic Party   10
## 11             #TrumpShutdown  6793 Democratic Party   11
## 12             #NetNeutrality  6748 Democratic Party   12
## 13            #ChildTaxCredit  6673 Democratic Party   13
## 14                #2020Census  6511 Democratic Party   14
## 15         #BlackHistoryMonth  6100 Democratic Party   15
## 16                #GetCovered  5898 Democratic Party   16
## 17                      #AZ01  5719 Democratic Party   17
## 18                       #ACA  5329 Democratic Party   18
## 19                    #SCOTUS  5187 Democratic Party   19
## 20              #ActOnClimate  4624 Democratic Party   20
## 21        #WomensHistoryMonth  4358 Democratic Party   21
## 22                 #HeroesAct  4320 Democratic Party   22
## 23               #EqualityAct  4290 Democratic Party   23
## 24                  #Dreamers  3924 Democratic Party   24
## 25                       #HR1  3784 Democratic Party   25
## 26                     #LGBTQ  3643 Democratic Party   26
## 27                       #HR8  3569 Democratic Party   27
## 28                #PuertoRico  3463 Democratic Party   28
## 29                     #Trump  3419 Democratic Party   29
## 30            #MedicareForAll  3221 Democratic Party   30
## 31                   #COVID19 18448 Republican Party    1
## 32                 #TaxReform  7309 Republican Party    2
## 33                      #SOTU  6708 Republican Party    3
## 34                      #NY21  6511 Republican Party    4
## 35               #coronavirus  6034 Republican Party    5
## 36         #BidenBorderCrisis  5869 Republican Party    6
## 37                     #utpol  5579 Republican Party    7
## 38                     #gapol  5383 Republican Party    8
## 39                     #USMCA  5090 Republican Party    9
## 40                      #MAGA  4740 Republican Party   10
## 41                 #taxreform  4594 Republican Party   11
## 42                     #China  4226 Republican Party   12
## 43         #TaxCutsandJobsAct  3383 Republican Party   13
## 44                 #Venezuela  3322 Republican Party   14
## 45                      #Ohio  3254 Republican Party   15
## 46                    #SCOTUS  3235 Republican Party   16
## 47                       #CCP  3171 Republican Party   17
## 48                     #ICYMI  3041 Republican Party   18
## 49                #PuertoRico  3019 Republican Party   19
## 50                      #AZ05  2777 Republican Party   20
## 51                   #Florida  2745 Republican Party   21
## 52 #PaycheckProtectionProgram  2659 Republican Party   22
## 53                      #Cuba  2614 Republican Party   23
## 54                  #Kentucky  2601 Republican Party   24
## 55                  #BREAKING  2600 Republican Party   25
## 56              #Bidenflation  2569 Republican Party   26
## 57                      #NE02  2514 Republican Party   27
## 58         #StandUpForAmerica  2484 Republican Party   28
## 59                         #1  2304 Republican Party   29
## 60                   #ProLife  2159 Republican Party   30
ggplot(hashtags_top_30_by_party, aes(x = count, y = reorder_within(hashtag, -rank, party))) +
  geom_col(aes(fill = party)) +
  geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
  facet_wrap(~party, scales = "free") +
  labs(title = "Top 30 Most Common Hashtags Used by Congress Members in Tweets, by Party, 2017-2023", x = "Number of Tweets", y = "Hashtag") +
  scale_fill_manual(values = c(D_HEX, R_HEX)) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
  theme_minimal() +
  theme(legend.position = "none")

The hashtags reveal the priorities and strategies of the two parties. While both parties have used platform slogans as hashtags, such as “#ForThePeople” and “#BuildBackBetter” by Democrats and “#MAGA” and “#StandUpForAmerica” by Republicans, their other hashtags often relate to their stance on current issues. For example, Democrats have used hashtags related to healthcare, climate change, and gun control, while Republicans have used hashtags related to the border, tax cuts, and the economy.

It is also interesting to note the difference in strategies between the two parties, with Democrats primarily using catchy slogans as hashtags, while Republicans uses hashtags that are more vague and open to interpretation. For example, looking at hashtags by democratic members, most of them are “#EndGunViolence”, “#NetNeutrality”, “#ChildTaxCredit”, “#BlackHistoryMonth”, “#GetCovered”, and more. These hashtags strongly hint at the party’s policy goals and stance on issues. Meanwhile, looking at hashtags by republican members, a lot of them are “#China”, “#Ohio”, “#Cuba”, “#NY21”, etc, which are meaningless without context. This suggests that Republicans are using hashtags more as a way to draw attention to specific events rather than to promote their policy goals.

Top Hashtags by Party and Year

Let’s consider time as well.

hashtags_top_10_d_2018 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2018-01-01' AND time < '2019-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_r_2018 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2018-01-01' AND time < '2019-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_d_2019 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2019-01-01' AND time < '2020-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_r_2019 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2019-01-01' AND time < '2020-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_d_2020 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2020-01-01' AND time < '2021-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_r_2020 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2020-01-01' AND time < '2021-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_d_2021 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2021-01-01' AND time < '2022-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_r_2021 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2021-01-01' AND time < '2022-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_d_2022 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2022-01-01' AND time < '2023-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_r_2022 <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2022-01-01' AND time < '2023-01-01') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 10")
hashtags_top_10_d_2018$year <- 2018
hashtags_top_10_r_2018$year <- 2018
hashtags_top_10_d_2019$year <- 2019
hashtags_top_10_r_2019$year <- 2019
hashtags_top_10_d_2020$year <- 2020
hashtags_top_10_r_2020$year <- 2020
hashtags_top_10_d_2021$year <- 2021
hashtags_top_10_r_2021$year <- 2021
hashtags_top_10_d_2022$year <- 2022
hashtags_top_10_r_2022$year <- 2022
hashtags_top_10_d_2018$party <- "D"
hashtags_top_10_r_2018$party <- "R"
hashtags_top_10_d_2019$party <- "D"
hashtags_top_10_r_2019$party <- "R"
hashtags_top_10_d_2020$party <- "D"
hashtags_top_10_r_2020$party <- "R"
hashtags_top_10_d_2021$party <- "D"
hashtags_top_10_r_2021$party <- "R"
hashtags_top_10_d_2022$party <- "D"
hashtags_top_10_r_2022$party <- "R"
hashtags_top_10_by_party_and_year <- rbind(hashtags_top_10_d_2018, hashtags_top_10_r_2018, hashtags_top_10_d_2019, hashtags_top_10_r_2019, hashtags_top_10_d_2020, hashtags_top_10_r_2020, hashtags_top_10_d_2021, hashtags_top_10_r_2021, hashtags_top_10_d_2022, hashtags_top_10_r_2022)
library(ggh4x)
hashtags_top_10_by_party_and_year
##                        hashtag count year party
## 1                  #GOPTaxScam  3753 2018     D
## 2               #NetNeutrality  3639 2018     D
## 3               #TrumpShutdown  2477 2018     D
## 4      #FamiliesBelongTogether  2315 2018     D
## 5                        #SOTU  2168 2018     D
## 6                        #DACA  1988 2018     D
## 7                #ForThePeople  1923 2018     D
## 8                       #nmpol  1704 2018     D
## 9              #ProtectOurCare  1653 2018     D
## 10                      #mtpol  1582 2018     D
## 11                  #TaxReform  3194 2018     R
## 12          #TaxCutsandJobsAct  2711 2018     R
## 13                      #utpol  2124 2018     R
## 14                       #SOTU  2009 2018     R
## 15                  #taxreform  2000 2018     R
## 16               #BetterOffNow  1984 2018     R
## 17                       #MAGA  1663 2018     R
## 18            #SchumerShutdown  1562 2018     R
## 19                       #VA10  1478 2018     R
## 20                     #SCOTUS  1162 2018     R
## 21               #ForThePeople  7972 2019     D
## 22              #TrumpShutdown  4312 2019     D
## 23             #ProtectOurCare  3287 2019     D
## 24             #EndGunViolence  2667 2019     D
## 25                        #HR8  2590 2019     D
## 26         #DefendOurDemocracy  2205 2019     D
## 27                #EqualityAct  2068 2019     D
## 28               #ActOnClimate  2018 2019     D
## 29                        #HR1  2009 2019     D
## 30                       #SOTU  1949 2019     D
## 31                      #USMCA  3501 2019     R
## 32                  #Venezuela  1823 2019     R
## 33                       #SOTU  1767 2019     R
## 34                       #Ohio  1064 2019     R
## 35                       #MAGA   956 2019     R
## 36                   #USMCAnow   952 2019     R
## 37               #GreenNewDeal   816 2019     R
## 38                       #AZ05   781 2019     R
## 39                       #NY21   745 2019     R
## 40                      #lagov   739 2019     R
## 41                    #COVID19 32926 2020     D
## 42                #coronavirus  7178 2020     D
## 43                 #2020Census  4905 2020     D
## 44                  #HeroesAct  4319 2020     D
## 45               #ForThePeople  3585 2020     D
## 46                #Coronavirus  1981 2020     D
## 47          #JusticeInPolicing  1873 2020     D
## 48              #FamiliesFirst  1850 2020     D
## 49           #BlackLivesMatter  1700 2020     D
## 50                   #CARESAct  1698 2020     D
## 51                    #COVID19 15483 2020     R
## 52                #coronavirus  5935 2020     R
## 53                      #gapol  4007 2020     R
## 54  #PaycheckProtectionProgram  2563 2020     R
## 55                      #gasen  1960 2020     R
## 56                        #PPP  1935 2020     R
## 57                      #China  1741 2020     R
## 58                       #NY21  1666 2020     R
## 59                      #GAsen  1598 2020     R
## 60                       #SOTU  1506 2020     R
## 61            #BuildBackBetter 11599 2021     D
## 62         #AmericanRescuePlan  8716 2021     D
## 63                    #COVID19  8325 2021     D
## 64             #ChildTaxCredit  5435 2021     D
## 65               #ForThePeople  3235 2021     D
## 66              #StopAsianHate  1535 2021     D
## 67         #BuildBackBetterAct  1446 2021     D
## 68                #EqualityAct  1272 2021     D
## 69          #BlackHistoryMonth  1261 2021     D
## 70                        #HR1  1135 2021     D
## 71          #BidenBorderCrisis  3090 2021     R
## 72                    #COVID19  2616 2021     R
## 73                    #SOSCuba  1533 2021     R
## 74                       #IA02  1380 2021     R
## 75                       #Cuba  1274 2021     R
## 76                        #CCP  1006 2021     R
## 77                #Afghanistan   959 2021     R
## 78                       #IA01   928 2021     R
## 79                       #NY21   753 2021     R
## 80                      #utpol   752 2021     R
## 81      #InflationReductionAct  2443 2022     D
## 82         #AmericanRescuePlan  1843 2022     D
## 83                       #SOTU  1516 2022     D
## 84          #BlackHistoryMonth  1425 2022     D
## 85          #CancelStudentDebt  1363 2022     D
## 86         #PeopleOverPolitics  1357 2022     D
## 87                    #COVID19  1130 2022     D
## 88            #BuildBackBetter  1086 2022     D
## 89             #EndGunViolence  1070 2022     D
## 90         #WomensHistoryMonth   970 2022     D
## 91          #BidenBorderCrisis  2059 2022     R
## 92               #Bidenflation  1785 2022     R
## 93                       #NY21  1544 2022     R
## 94                    #Ukraine  1263 2022     R
## 95        #CommitmentToAmerica  1176 2022     R
## 96                       #SOTU   903 2022     R
## 97          #StandUpForAmerica   752 2022     R
## 98                        #CCP   621 2022     R
## 99                       #OH01   569 2022     R
## 100                      #IA04   564 2022     R
ggplot(hashtags_top_10_by_party_and_year, aes(x = count, y = reorder_within(hashtag, count, interaction(year, party)), fill = party)) +
  geom_col() +
  facet_grid2(party ~ year, scales = "free", independent = "y", labeller = as_labeller(c("D" = "Democratic Party", "R" = "Republican Party", "2018" = "2018", "2019" = "2019", "2020" = "2020", "2021" = "2021", "2022" = "2022"))) +
  labs(title = "Top 10 Most Common Hashtags Used by Congress Members in Tweets, by Party and Year, 2018-2022", x = "Number of Tweets", y = "Hashtag", fill = "Party") +
  scale_fill_manual(values = c(D_HEX, R_HEX)) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_reordered(labels = function(x) gsub("___\\d{4}\\.(D|R)", "", x)) +
  theme_minimal() +
  theme(legend.position = "none")

In 2018 and 2019, The Democratic members have used hashtags to criticize the Trump administration and promote their policies regarding healthcare, equality and climate change, while the Republican members have used hashtags to support Trump and promote their policies regarding tax reform and the border in the same period.

In 2020, both parties have used hashtags related to the COVID-19 pandemic and the election, but the Democratic members have also used hashtags to raise awareness about social justice issues, particularly the Black Lives Matter movement, while the Republican members have used hashtags about the economy.

In 2021, the Democratic members supported the Biden administration’s policy regarding the pandemic and economy, and continued support on social justice issues, while the Republican members have shifted focus on the border and international developments, such as the esclations in Cuba and Afghanistan.

In 2022, the Democratic members have used hashtags regarding gun violence, cancelling student debt, as well as continued support on social justice issues, while the Republican members have used hashtags regarding the border, the economy, and the situation in Ukraine.

Top Hashtag by Chamber

Let’s also analyze with respect to chamber.

hashtags_top_30_h <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'house') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 30")
hashtags_top_30_s <- dbGetQuery(con, "SELECT hashtag, COUNT(*) FROM (SELECT unnest(hashtags) AS hashtag FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'senate') AS hashtags GROUP BY hashtag ORDER BY count DESC LIMIT 30")
hashtags_top_30_h$chamber <- "House of Representatives"
hashtags_top_30_h$rank <- rank(-hashtags_top_30_h$count)
hashtags_top_30_s$chamber <- "Senate"
hashtags_top_30_s$rank <- rank(-hashtags_top_30_s$count)
hashtags_top_30_by_chamber <- rbind(hashtags_top_30_h, hashtags_top_30_s)
hashtags_top_30_by_chamber
##                hashtag count                  chamber rank
## 1             #COVID19 47768 House of Representatives    1
## 2        #ForThePeople 17168 House of Representatives    2
## 3                #SOTU 12363 House of Representatives    3
## 4     #BuildBackBetter 11673 House of Representatives    4
## 5         #coronavirus 10296 House of Representatives    5
## 6  #AmericanRescuePlan  9763 House of Representatives    6
## 7          #GOPTaxScam  8490 House of Representatives    7
## 8      #EndGunViolence  6777 House of Representatives    8
## 9      #ProtectOurCare  6728 House of Representatives    9
## 10               #NY21  6543 House of Representatives   10
## 11         #2020Census  6197 House of Representatives   11
## 12               #DACA  6026 House of Representatives   12
## 13     #ChildTaxCredit  5884 House of Representatives   13
## 14         #PuertoRico  5871 House of Representatives   14
## 15               #AZ01  5766 House of Representatives   15
## 16  #BlackHistoryMonth  5520 House of Representatives   16
## 17      #TrumpShutdown  5493 House of Representatives   17
## 18             #SCOTUS  5371 House of Representatives   18
## 19          #TaxReform  5211 House of Representatives   19
## 20           #BREAKING  4703 House of Representatives   20
## 21         #GetCovered  4667 House of Representatives   21
## 22              #USMCA  4637 House of Representatives   22
## 23              #utpol  4539 House of Representatives   23
## 24  #BidenBorderCrisis  4464 House of Representatives   24
## 25                #HR1  4397 House of Representatives   25
## 26          #HeroesAct  4258 House of Representatives   26
## 27 #WomensHistoryMonth  4243 House of Representatives   27
## 28               #MAGA  4238 House of Representatives   28
## 29      #NetNeutrality  4225 House of Representatives   29
## 30              #ICYMI  4198 House of Representatives   30
## 31            #COVID19 13323                   Senate    1
## 32              #gapol  4360                   Senate    2
## 33             #SCOTUS  3059                   Senate    3
## 34        #coronavirus  3039                   Senate    4
## 35               #Ohio  2829                   Senate    5
## 36      #NetNeutrality  2712                   Senate    6
## 37              #mtpol  2665                   Senate    7
## 38               #SOTU  2634                   Senate    8
## 39          #TaxReform  2567                   Senate    9
## 40           #Kentucky  2538                   Senate   10
## 41              #China  2235                   Senate   11
## 42              #utpol  2098                   Senate   12
## 43          #Venezuela  2037                   Senate   13
## 44              #gasen  2013                   Senate   14
## 45                 #WV  1783                   Senate   15
## 46            #Florida  1755                   Senate   16
## 47               #DACA  1664                   Senate   17
## 48              #mtsen  1658                   Senate   18
## 49              #GAsen  1594                   Senate   19
## 50    #BuildBackBetter  1504                   Senate   20
## 51          #taxreform  1477                   Senate   21
## 52         #GOPTaxScam  1443                   Senate   22
## 53  #BidenBorderCrisis  1435                   Senate   23
## 54              #ICYMI  1434                   Senate   24
## 55          #DemDebate  1401                   Senate   25
## 56              #MOSen  1400                   Senate   26
## 57     #ProtectOurCare  1388                   Senate   27
## 58             #Senate  1384                   Senate   28
## 59   #99countymeetings  1363                   Senate   29
## 60  #BlackHistoryMonth  1350                   Senate   30
ggplot(hashtags_top_30_by_chamber, aes(x = count, y = reorder_within(hashtag, -rank, chamber))) +
  geom_col(aes(fill = chamber)) +
  geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
  facet_wrap(~chamber, scales = "free") +
  labs(title = "Top 30 Most Common Hashtags Used by Congress Members in Tweets, by Chamber, 2017-2023", x = "Number of Tweets", y = "Hashtag") +
  scale_fill_manual(values = c(HOUSE_HEX, SENATE_HEX)) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
  theme_minimal() +
  theme(legend.position = "none")

Top Words

Next, we will analyze the most common words used by Congress members in their tweets.

# >>> ACTION REQUIRED: Please uncomment the following line and remove the workaround below
# tokens_top_30 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets) AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")

# Workaround for a bug in process_text.py. I've fixed it so you won't need this anymore. You can safely remove this workaround, but you need to uncomment the code above.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
tokens_top_30 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets) AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(30)
# End of workaround

tokens_top_30$rank <- rank(-tokens_top_30$count)
tokens_top_30
##           word  count rank
## 1       people 344460    1
## 2        trump 295509    2
## 3          act 286611    3
## 4    president 284337    4
## 5     american 261957    5
## 6        house 258124    6
## 7         time 248561    7
## 8       health 242520    8
## 9         bill 238280    9
## 10    congress 235081   10
## 11   americans 234255   11
## 12     support 232138   12
## 13         day 223232   13
## 14       biden 218421   14
## 15    families 213354   15
## 16     country 208284   16
## 17        care 193531   17
## 18     america 178198   18
## 19       proud 177988   19
## 20      senate 173809   20
## 21        vote 171776   21
## 22       women 159371   22
## 23   community 155184   23
## 24     federal 145078   24
## 25   democrats 145043   25
## 26        week 143063   26
## 27     protect 142032   27
## 28 communities 140943   28
## 29      nation 139927   29
## 30    continue 138902   30
ggplot(tokens_top_30, aes(x = count, y = reorder(word, count))) +
  geom_col() +
  geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
  labs(title = "Top 30 Most Common Words Used by Congress Members in Tweets, 2017-2023", x = "Number of Occurrences", y = "Word") +
  scale_x_continuous(labels = scales::comma) +
  theme_minimal()

The most common words used by Congress members in their tweets are “people”, “trump”, “act”, “president”, “american”, and etc. These words are general and do not provide much insight into the content of the tweets. However, we can see that the word “trump” is one of the most common, which suggests that the former president Donald Trump was a major topic of discussion among Congress members during his time in office.

Top Words by Party

Let’s also analyze with respect to party.

# >>> ACTION REQUIRED: Please uncomment the following lines and remove the workaround below
# tokens_top_30_d <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D') AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")
# tokens_top_30_r <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R') AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")

# Workaround for a bug in process_text.py. I've fixed it so you won't need this anymore. You can safely remove this workaround, but you need to uncomment the code above.
tokens_top_30_d <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(30)
tokens_top_30_r <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(30)
# End of workaround

tokens_top_30_d$party <- "Democratic Party"
tokens_top_30_d$rank <- rank(-tokens_top_30_d$count)
tokens_top_30_r$party <- "Republican Party"
tokens_top_30_r$rank <- rank(-tokens_top_30_r$count)
tokens_top_30_by_party <- rbind(tokens_top_30_d, tokens_top_30_r)
tokens_top_30_by_party
##           word  count            party rank
## 1        trump 236189 Democratic Party    1
## 2       people 225227 Democratic Party    2
## 3          act 200323 Democratic Party    3
## 4       health 194126 Democratic Party    4
## 5    president 165742 Democratic Party    5
## 6        house 156304 Democratic Party    6
## 7         time 155262 Democratic Party    7
## 8     congress 153866 Democratic Party    8
## 9     families 152459 Democratic Party    9
## 10        care 152418 Democratic Party   10
## 11     support 142371 Democratic Party   11
## 12        bill 142153 Democratic Party   12
## 13   americans 140463 Democratic Party   13
## 14    american 139352 Democratic Party   14
## 15         day 134343 Democratic Party   15
## 16     country 131589 Democratic Party   16
## 17        vote 119440 Democratic Party   17
## 18       proud 118659 Democratic Party   18
## 19       women 115740 Democratic Party   19
## 20   community 112359 Democratic Party   20
## 21      senate 110183 Democratic Party   21
## 22 communities 102637 Democratic Party   22
## 23     protect  99892 Democratic Party   23
## 24       fight  94356 Democratic Party   24
## 25     federal  93832 Democratic Party   25
## 26        join  90213 Democratic Party   26
## 27     america  88892 Democratic Party   27
## 28      public  86985 Democratic Party   28
## 29      rights  86401 Democratic Party   29
## 30    continue  86179 Democratic Party   30
## 31       biden 170937 Republican Party    1
## 32    american 120359 Republican Party    2
## 33   president 116314 Republican Party    3
## 34      people 113273 Republican Party    4
## 35       house 100287 Republican Party    5
## 36      border  98775 Republican Party    6
## 37        bill  93545 Republican Party    7
## 38   americans  91526 Republican Party    8
## 39   democrats  90945 Republican Party    9
## 40        time  90559 Republican Party   10
## 41     america  87434 Republican Party   11
## 42     support  87298 Republican Party   12
## 43         day  86907 Republican Party   13
## 44         act  84121 Republican Party   14
## 45    congress  78953 Republican Party   15
## 46     country  73466 Republican Party   16
## 47      nation  62459 Republican Party   17
## 48    national  61885 Republican Party   18
## 49      senate  61778 Republican Party   19
## 50        week  60275 Republican Party   20
## 51    families  59122 Republican Party   21
## 52       proud  58101 Republican Party   22
## 53       trump  55816 Republican Party   23
## 54         tax  55486 Republican Party   24
## 55         law  51769 Republican Party   25
## 56    continue  51726 Republican Party   26
## 57      energy  51666 Republican Party   27
## 58      crisis  51022 Republican Party   28
## 59        vote  50284 Republican Party   29
## 60     federal  49879 Republican Party   30
ggplot(tokens_top_30_by_party, aes(x = count, y = reorder_within(word, count, party))) +
  geom_col(aes(fill = party)) +
  geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
  facet_wrap(~party, scales = "free") +
  labs(title = "Top 30 Most Common Words Used by Congress Members in Tweets, by Party, 2017-2023", x = "Number of Occurrences", y = "Word") +
  scale_fill_manual(values = c(D_HEX, R_HEX)) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
  theme_minimal() +
  theme(legend.position = "none")

The most common words used by Democratic members in their tweets are “trump”, “people”, “act”, “health”, “president”, and etc. Notably ones down the list are “women”, “community/communities”, “protect”, “fight”, “public”, and “rights”. These words suggest that Democratic members discussed majorly about the former president Donald Trump, as well as current issues such as healthcare and rights of women and other communities.

On the other hand, the most common words used by Republican members in their tweets are “biden”, “american”, “president”, “people”, “house”, and etc. Again, noteworthy ones down the list are “border”, “democrats”, “nation”, “national”, “trump”, “tax”, “law”, “energy”, and “crisis”. These words suggest that Republican members discussed majorly about the current president Joe Biden, as well as current issues such as border security, energy policy, and tax law. Moreover, the word “trump” is still present in the list, indicating that the former president Donald Trump was still a major topic of discussion among Republican members.

It is surprising that gun control, climate change, and other major current issues which are present in the list of top hashtags, are not present in the list of top words. This suggests that Congress members are relying more on hashtags to convey their stance on these issues, rather than addressing them directly in their tweets.

Top Words by Party and Year

Let’s also analyze with respect to party and year.

# >>> ACTION REQUIRED: Please uncomment the following lines and remove the workaround below
# tokens_top_10_d_2018 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2018-01-01' AND time < '2019-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2018 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2018-01-01' AND time < '2019-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_d_2019 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2019-01-01' AND time < '2020-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2019 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2019-01-01' AND time < '2020-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_d_2020 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2020-01-01' AND time < '2021-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2020 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2020-01-01' AND time < '2021-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_d_2021 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2021-01-01' AND time < '2022-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2021 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2021-01-01' AND time < '2022-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_d_2022 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2022-01-01' AND time < '2023-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")
# tokens_top_10_r_2022 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2022-01-01' AND time < '2023-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 10")

# Workaround for a bug in process_text.py. I've fixed it so you won't need this anymore. You can safely remove this workaround, but you need to uncomment the code above.
tokens_top_10_d_2018 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2018-01-01' AND time < '2019-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
tokens_top_10_r_2018 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2018-01-01' AND time < '2019-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
tokens_top_10_d_2019 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2019-01-01' AND time < '2020-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
tokens_top_10_r_2019 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2019-01-01' AND time < '2020-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
tokens_top_10_d_2020 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2020-01-01' AND time < '2021-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
tokens_top_10_r_2020 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2020-01-01' AND time < '2021-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
tokens_top_10_d_2021 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2021-01-01' AND time < '2022-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
tokens_top_10_r_2021 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2021-01-01' AND time < '2022-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
tokens_top_10_d_2022 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND time >= '2022-01-01' AND time < '2023-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
tokens_top_10_r_2022 <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND time >= '2022-01-01' AND time < '2023-01-01') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(10)
# End of workaround

tokens_top_10_d_2018$year <- 2018
tokens_top_10_r_2018$year <- 2018
tokens_top_10_d_2019$year <- 2019
tokens_top_10_r_2019$year <- 2019
tokens_top_10_d_2020$year <- 2020
tokens_top_10_r_2020$year <- 2020
tokens_top_10_d_2021$year <- 2021
tokens_top_10_r_2021$year <- 2021
tokens_top_10_d_2022$year <- 2022
tokens_top_10_r_2022$year <- 2022
tokens_top_10_d_2018$party <- "D"
tokens_top_10_r_2018$party <- "R"
tokens_top_10_d_2019$party <- "D"
tokens_top_10_r_2019$party <- "R"
tokens_top_10_d_2020$party <- "D"
tokens_top_10_r_2020$party <- "R"
tokens_top_10_d_2021$party <- "D"
tokens_top_10_r_2021$party <- "R"
tokens_top_10_d_2022$party <- "D"
tokens_top_10_r_2022$party <- "R"
tokens_top_10_by_party_and_year <- rbind(tokens_top_10_d_2018, tokens_top_10_r_2018, tokens_top_10_d_2019, tokens_top_10_r_2019, tokens_top_10_d_2020, tokens_top_10_r_2020, tokens_top_10_d_2021, tokens_top_10_r_2021, tokens_top_10_d_2022, tokens_top_10_r_2022)
tokens_top_10_by_party_and_year
##                word count year party
## 1             trump 50562 2018     D
## 2            people 29776 2018     D
## 3         president 25517 2018     D
## 4          congress 23849 2018     D
## 5          families 21748 2018     D
## 6            health 21598 2018     D
## 7              time 21116 2018     D
## 8              vote 19832 2018     D
## 9           country 19552 2018     D
## 10            house 19496 2018     D
## 11          support 14552 2018     R
## 12             bill 14199 2018     R
## 13              day 13786 2018     R
## 14            house 13264 2018     R
## 15           people 13036 2018     R
## 16              tax 12919 2018     R
## 17             time 12722 2018     R
## 18        president 12522 2018     R
## 19           senate 12365 2018     R
## 20         american 11575 2018     R
## 21            trump 64391 2019     D
## 22        president 46566 2019     D
## 23           people 45418 2019     D
## 24            house 38494 2019     D
## 25              act 36065 2019     D
## 26         congress 33137 2019     D
## 27             time 30891 2019     D
## 28           health 29954 2019     D
## 29          country 27052 2019     D
## 30         american 26558 2019     D
## 31        president 18596 2019     R
## 32           people 17884 2019     R
## 33        democrats 17841 2019     R
## 34            house 17254 2019     R
## 35         american 15654 2019     R
## 36             time 15470 2019     R
## 37           border 14800 2019     R
## 38  realdonaldtrump 14176 2019     R
## 39         congress 13622 2019     R
## 40          support 13298 2019     R
## 41            trump 72107 2020     D
## 42           health 55868 2020     D
## 43           people 52782 2020     D
## 44        president 43116 2020     D
## 45              act 41588 2020     D
## 46        americans 37391 2020     D
## 47             vote 36386 2020     D
## 48             care 34844 2020     D
## 49          covid19 34606 2020     D
## 50             time 33921 2020     D
## 51  realdonaldtrump 24247 2020     R
## 52           people 23287 2020     R
## 53        president 22920 2020     R
## 54         american 22684 2020     R
## 55      coronavirus 20670 2020     R
## 56          support 19399 2020     R
## 57             time 18487 2020     R
## 58          america 17697 2020     R
## 59       businesses 16965 2020     R
## 60              day 16766 2020     R
## 61              act 44490 2021     D
## 62           people 38842 2021     D
## 63         families 32646 2021     D
## 64         american 30485 2021     D
## 65           health 30050 2021     D
## 66             time 28616 2021     D
## 67             bill 26405 2021     D
## 68             care 25938 2021     D
## 69          support 25638 2021     D
## 70            house 24792 2021     D
## 71            biden 61150 2021     R
## 72           border 29875 2021     R
## 73        president 25348 2021     R
## 74        democrats 25256 2021     R
## 75         american 24835 2021     R
## 76           people 23791 2021     R
## 77        americans 22233 2021     R
## 78             bill 21846 2021     R
## 79          america 18654 2021     R
## 80            house 17748 2021     R
## 81              act 42479 2022     D
## 82           people 33985 2022     D
## 83           health 31015 2022     D
## 84          support 28402 2022     D
## 85             care 25819 2022     D
## 86         families 25526 2022     D
## 87              day 25467 2022     D
## 88         congress 24863 2022     D
## 89             time 24587 2022     D
## 90            proud 23175 2022     D
## 91            biden 68514 2022     R
## 92           border 31787 2022     R
## 93         american 28227 2022     R
## 94        inflation 23335 2022     R
## 95           energy 23160 2022     R
## 96        president 22837 2022     R
## 97        americans 22654 2022     R
## 98           people 21468 2022     R
## 99        democrats 20855 2022     R
## 100         america 19406 2022     R
ggplot(tokens_top_10_by_party_and_year, aes(x = count, y = reorder_within(word, count, interaction(year, party)), fill = party)) +
  geom_col() +
  facet_grid2(party ~ year, scales = "free", independent = "y", labeller = as_labeller(c("D" = "Democratic Party", "R" = "Republican Party", "2018" = "2018", "2019" = "2019", "2020" = "2020", "2021" = "2021", "2022" = "2022"))) +
  labs(title = "Top 10 Most Common Words Used by Congress Members in Tweets, by Party and Year, 2018-2022", x = "Number of Occurrences", y = "Word", fill = "Party") +
  scale_fill_manual(values = c(D_HEX, R_HEX)) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_reordered(labels = function(x) gsub("___\\d{4}\\.(D|R)", "", x)) +
  theme_minimal() +
  theme(legend.position = "none")

“Trump” was the top word for Democratic members during the Trump administration, while “Biden” was the top word for Republican members during the Biden administration, suggesting the opposing party like to mention the ruling president in their tweets.

Democratic members’ usage of words have been consistent over the years, focusing on healthcare. On the other hand, Republican members’ usage of words have evolved, likely reacting to developments in various issues. For example, the popularity of the word “border” went from outside the top 10 in 2018 to the top 2 in 2022. There are also introduction of new words in the top 10 since the Biden administration, such as “inflation” and “energy”, suggesting that Republican members are criticizing the Biden administration’s policies on these issues.

Top Words by Chamber

Let’s also analyze with respect to chamber.

# >>> ACTION REQUIRED: Please uncomment the following lines and remove the workaround below
# tokens_top_30_h <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'house') AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")
# tokens_top_30_s <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'senate') AS tokens GROUP BY word ORDER BY count DESC LIMIT 30")

# Workaround for a bug in process_text.py. I've fixed it so you won't need this anymore. You can safely remove this workaround, but you need to uncomment the code above.
tokens_top_30_h <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'house') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(30)
tokens_top_30_s <- dbGetQuery(con, "SELECT word, COUNT(*) FROM (SELECT unnest(tokens) AS word FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE chamber = 'senate') AS tokens GROUP BY word ORDER BY count DESC LIMIT 100") |>
  anti_join(stop_words, by = "word") |>
  head(30)
# End of workaround

tokens_top_30_h$chamber <- "House of Representatives"
tokens_top_30_h$rank <- rank(-tokens_top_30_h$count)
tokens_top_30_s$chamber <- "Senate"
tokens_top_30_s$rank <- rank(-tokens_top_30_s$count)
tokens_top_30_by_chamber <- rbind(tokens_top_30_h, tokens_top_30_s)
tokens_top_30_by_chamber
##           word  count                  chamber rank
## 1       people 263837 House of Representatives    1
## 2        trump 227818 House of Representatives    2
## 3        house 227805 House of Representatives    3
## 4          act 216053 House of Representatives    4
## 5    president 215188 House of Representatives    5
## 6     american 201051 House of Representatives    6
## 7     congress 194269 House of Representatives    7
## 8         time 190065 House of Representatives    8
## 9    americans 177940 House of Representatives    9
## 10         day 173473 House of Representatives   10
## 11      health 171928 House of Representatives   11
## 12     support 171891 House of Representatives   12
## 13        bill 170712 House of Representatives   13
## 14       biden 165887 House of Representatives   14
## 15    families 158523 House of Representatives   15
## 16     country 153382 House of Representatives   16
## 17       proud 139764 House of Representatives   17
## 18     america 138306 House of Representatives   18
## 19        care 132731 House of Representatives   19
## 20        vote 130353 House of Representatives   20
## 21   community 129029 House of Representatives   21
## 22       women 120854 House of Representatives   22
## 23        week 111774 House of Representatives   23
## 24   democrats 111676 House of Representatives   24
## 25        join 107640 House of Representatives   25
## 26     federal 107106 House of Representatives   26
## 27 communities 105625 House of Representatives   27
## 28      nation 105092 House of Representatives   28
## 29    continue 104495 House of Representatives   29
## 30     protect 103885 House of Representatives   30
## 31      senate  90259                   Senate    1
## 32      people  80623                   Senate    2
## 33      health  70592                   Senate    3
## 34         act  70558                   Senate    4
## 35   president  69149                   Senate    5
## 36       trump  67691                   Senate    6
## 37        bill  67568                   Senate    7
## 38    american  60906                   Senate    8
## 39        care  60800                   Senate    9
## 40     support  60247                   Senate   10
## 41        time  58496                   Senate   11
## 42   americans  56315                   Senate   12
## 43     country  54902                   Senate   13
## 44    families  54831                   Senate   14
## 45       biden  52534                   Senate   15
## 46         day  49759                   Senate   16
## 47        vote  41423                   Senate   17
## 48    congress  40812                   Senate   18
## 49     america  39892                   Senate   19
## 50  bipartisan  39294                   Senate   20
## 51       women  38517                   Senate   21
## 52       proud  38224                   Senate   22
## 53     protect  38147                   Senate   23
## 54     federal  37972                   Senate   24
## 55         law  37206                   Senate   25
## 56    national  37094                   Senate   26
## 57 communities  35318                   Senate   27
## 58       fight  35157                   Senate   28
## 59      nation  34835                   Senate   29
## 60    continue  34407                   Senate   30
ggplot(tokens_top_30_by_chamber, aes(x = count, y = reorder_within(word, count, chamber))) +
  geom_col(aes(fill = chamber)) +
  geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
  facet_wrap(~chamber, scales = "free") +
  labs(title = "Top 30 Most Common Words Used by Congress Members in Tweets, by Chamber, 2017-2023", x = "Number of Occurrences", y = "Word") +
  scale_fill_manual(values = c(HOUSE_HEX, SENATE_HEX)) +
  scale_x_continuous(labels = scales::comma) +
  scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
  theme_minimal() +
  theme(legend.position = "none")

The most common words are mostly the same with the previous analysis that is by party. However, one particular word that stands out is “bipartisan” in the list of top words used by Senate members. This suggests that Senate members are more likely to discuss bipartisan issues and cooperation in their tweets compared to House members.

Sentiment Analysis by Party and Year

Let’s proceed to the analysis of tweet content. As mentioned, text mining is completed in Python beforehand. We can now retrieve the result from the database.

First, we will analyze the sentiment of tweets by party affiliation of the author.

sentiment_by_party <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id GROUP BY party")
sentiment_by_party
##   party       avg
## 1     D 0.2252987
## 2     R 0.2144899
## 3  <NA> 0.2358065
ggplot(sentiment_by_party, aes(x = party, y = avg)) +
  geom_col(fill = c(D_HEX, R_HEX, NA_HEX)) +
  labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Party, 2017-2023", x = "Party", y = "Average Sentiment Score") +
  scale_x_discrete(labels = c("Democratic", "Republican", "Independent")) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal()

The order of the average sentiment scores from highest to lowest is independent, Democratic, and Republican. This indicates that independent members have the most positive sentiment in their tweets, followed by Democratic members, and then Republican members. However, the difference in sentiment scores between the three parties is not significant, so we can only conclude that the sentiment of tweets of Congress members is generally positive.

Let’s see if the sentiment of tweets has changed over time.

sentiment_by_party_2018 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY party")
sentiment_by_party_2019 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY party")
sentiment_by_party_2020 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY party")
sentiment_by_party_2021 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY party")
sentiment_by_party_2022 <- dbGetQuery(con, "SELECT party, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY party")
sentiment_by_party_2018$year <- 2018
sentiment_by_party_2019$year <- 2019
sentiment_by_party_2020$year <- 2020
sentiment_by_party_2021$year <- 2021
sentiment_by_party_2022$year <- 2022
sentiment_by_party_ts <- rbind(sentiment_by_party_2018, sentiment_by_party_2019, sentiment_by_party_2020, sentiment_by_party_2021, sentiment_by_party_2022)
sentiment_by_party_ts
##    party       avg year
## 1      D 0.1944888 2018
## 2      R 0.3320895 2018
## 3   <NA> 0.2534372 2018
## 4      D 0.1939735 2019
## 5      R 0.2628996 2019
## 6   <NA> 0.1848596 2019
## 7      D 0.2036529 2020
## 8      R 0.2734945 2020
## 9   <NA> 0.1799959 2020
## 10     D 0.2799707 2021
## 11     R 0.1292012 2021
## 12  <NA> 0.3166578 2021
## 13     D 0.2901276 2022
## 14     R 0.1199303 2022
## 15  <NA> 0.3831453 2022
ggplot(sentiment_by_party_ts, aes(x = year, y = avg, fill = party)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Party and Year, 2018-2022", x = "Year", y = "Average Sentiment Score", fill = "Party") +
  scale_fill_manual(values = c(D_HEX, R_HEX, NA_HEX), labels = c("Democratic", "Republican", "Independent")) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal()

While the sentiment of tweets of Democratic members has been increasing from 2018 to 2022, the converse is true for Republican members. This suggests that Democratic members have become more positive in their tweets over time, while Republican members have become less positive. Surprisingly, the sentiment of tweets of independent members turned out to be the most positive in 2022 after reaching a low in 2020. But this could be due to the small sample size of independent members.

Sentiment Analysis by Chamber and Year

Now, we will run the same analysis but by chamber of the author.

sentiment_by_chamber <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id GROUP BY chamber")
sentiment_by_chamber
##   chamber       avg
## 1   house 0.2178552
## 2  senate 0.2306518
ggplot(sentiment_by_chamber, aes(x = chamber, y = avg)) +
  geom_col(fill = c(HOUSE_HEX, SENATE_HEX)) +
  labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Chamber, 2017-2023", x = "Chamber", y = "Average Sentiment Score") +
  scale_x_discrete(labels = c("House", "Senate")) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal()

sentiment_by_chamber_2018 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2018-01-01' AND time < '2019-01-01' GROUP BY chamber")
sentiment_by_chamber_2019 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2019-01-01' AND time < '2020-01-01' GROUP BY chamber")
sentiment_by_chamber_2020 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2020-01-01' AND time < '2021-01-01' GROUP BY chamber")
sentiment_by_chamber_2021 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2021-01-01' AND time < '2022-01-01' GROUP BY chamber")
sentiment_by_chamber_2022 <- dbGetQuery(con, "SELECT chamber, AVG(sentiment_score) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE time >= '2022-01-01' AND time < '2023-01-01' GROUP BY chamber")
sentiment_by_chamber_2018$year <- 2018
sentiment_by_chamber_2019$year <- 2019
sentiment_by_chamber_2020$year <- 2020
sentiment_by_chamber_2021$year <- 2021
sentiment_by_chamber_2022$year <- 2022
sentiment_by_chamber_ts <- rbind(sentiment_by_chamber_2018, sentiment_by_chamber_2019, sentiment_by_chamber_2020, sentiment_by_chamber_2021, sentiment_by_chamber_2022)
sentiment_by_chamber_ts
##    chamber       avg year
## 1    house 0.2515259 2018
## 2   senate 0.2530553 2018
## 3    house 0.2151629 2019
## 4   senate 0.2281195 2019
## 5    house 0.2220529 2020
## 6   senate 0.2528610 2020
## 7    house 0.2138896 2021
## 8   senate 0.2167865 2021
## 9    house 0.2123538 2022
## 10  senate 0.2219200 2022
ggplot(sentiment_by_chamber_ts, aes(x = year, y = avg, fill = chamber)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Chamber and Year, 2018-2022", x = "Year", y = "Average Sentiment Score", fill = "Chamber") +
  scale_fill_manual(values = c(HOUSE_HEX, SENATE_HEX), labels = c("House", "Senate")) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal()

Senate members have a slightly higher average sentiment score than Senate members in each year. Apart from that, the sentiment of tweets of both chambers has been relatively stable from 2018 to 2022, with a slight decrease in 2022 for both chambers, and a slight increase in 2020 for the Senate. This suggests that the sentiment of tweets of Congress members has been generally positive.

Sentiment Analysis by Topic and Party

ML topic inference is something that is not covered in this project. I will improve this part in the future. However, we can still analyze the sentiment of tweets by party and topic through tokens extracted from text mining.

sentiment_by_topic <- dbGetQuery(con, "SELECT party, topic, AVG(sentiment_score) FROM (SELECT unnest(tokens) AS topic, sentiment_score, party FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party IS NOT NULL) AS topics WHERE topic LIKE 'vaccine' OR topic LIKE 'lockdown' OR topic LIKE 'climate' OR topic LIKE 'abortion' OR topic LIKE 'gun' OR topic LIKE 'lgbtq' OR topic LIKE 'blm' OR topic LIKE 'gender' OR topic LIKE 'diversity' OR topic LIKE 'immigration' OR topic LIKE 'biden' OR topic LIKE 'trump' GROUP BY party, topic ORDER BY avg DESC")
sentiment_by_topic
##    party       topic          avg
## 1      D   diversity  0.537440683
## 2      D     vaccine  0.370930798
## 3      D       lgbtq  0.279358000
## 4      D      gender  0.256078122
## 5      R   diversity  0.249240119
## 6      D       biden  0.246496491
## 7      R       lgbtq  0.237023246
## 8      R     vaccine  0.182258747
## 9      D    abortion  0.166727510
## 10     R     climate  0.150572902
## 11     D     climate  0.109176012
## 12     R       trump  0.105979699
## 13     R    abortion  0.103581481
## 14     D immigration  0.080924406
## 15     D         blm  0.010409462
## 16     R      gender  0.005042393
## 17     D       trump -0.103745331
## 18     R immigration -0.104774530
## 19     D    lockdown -0.107225054
## 20     R    lockdown -0.129531098
## 21     R       biden -0.165162030
## 22     R         blm -0.173421729
## 23     R         gun -0.319348096
## 24     D         gun -0.480987827
ggplot(sentiment_by_topic, aes(x = topic, y = avg, fill = party)) +
  geom_col(position = "dodge") +
  labs(title = "Average Sentiment of Tweets of U.S. Congress Members, by Topic and Party, 2017-2023", x = "Topic", y = "Average Sentiment Score", fill = "Party", subtitle = "Note: Indepedent members are excluded due to small sample size") +
  scale_fill_manual(values = c(D_HEX, R_HEX, NA_HEX), labels = c("Democratic", "Republican")) +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal() +
  theme(legend.position = "none")

From the plot, we can see how Democratic and Republican members have spoken about different current issues. Democratic members have a more positive sentiment when talking about topics such as “vaccine”, “abortion”, “diversity”, “immigration”, and “LGBTQ”. On the other hand, Republican members have a more positive (or less negative) sentiment when talking about topics such as “Trump”, “gun”, and “climate”. It is interesting to see that Republican members have a more positive sentiment when talking about climate, which is usually a topic that is associated with Democratic members.

This concludes the sentiment analysis.

Top Accounts Retweeted

Now that we have finished analyzing text data, let’s also look at some axillary data. We will start by analyzing the most frequently retweeted accounts by Congress members.

accounts_retweeted_top_30 <- dbGetQuery(con, "SELECT retweeting, COUNT(*) FROM tweets WHERE retweeting IS NOT NULL GROUP BY retweeting ORDER BY count DESC LIMIT 30")
accounts_retweeted_top_30$rank <- rank(-accounts_retweeted_top_30$count)
accounts_retweeted_top_30
##         retweeting count rank
## 1         HouseGOP 12645    1
## 2  realDonaldTrump  6434    2
## 3        GOPLeader  6191    3
## 4     BillPascrell  6130    4
## 5            POTUS  5438    5
## 6       WhiteHouse  5266    6
## 7          thehill  5200    7
## 8          FoxNews  4919    8
## 9   HouseDemocrats  4422    9
## 10 WaysandMeansGOP  4388   10
## 11       SenateGOP  4155   11
## 12  HispanicCaucus  3852   12
## 13   SpeakerPelosi  3801   13
## 14    SteveScalise  3745   14
## 15    JudiciaryGOP  3522   15
## 16   WaysMeansCmte  3403   16
## 17   HouseCommerce  3348   17
## 18     RepDonBeyer  3305   18
## 19        JoeBiden  3199   19
## 20   OversightDems  3082   20
## 21  HouseJudiciary  2951   21
## 22    GOPoversight  2950   22
## 23       RepCasten  2909   23
## 24   ChuckGrassley  2740   24
## 25      dcexaminer  2191   25
## 26    kylegriffin1  2168   26
## 27   RepAdamSchiff  2131   27
## 28      SenateDems  2084   28
## 29       RepTenney  2050   29
## 30             GOP  1977   30
ggplot(accounts_retweeted_top_30, aes(x = count, y = reorder(retweeting, count))) +
  geom_col() +
  geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
  labs(title = "Top 30 Most Frequently Retweeted Accounts by Congress Members, 2017-2023", x = "Number of Retweets", y = "Account") +
  scale_x_continuous(labels = scales::comma) +
  theme_minimal()

The most common accounts retweeted by Congress members are “@HouseGOP,”@realDonaldTrump“,”@GOPLeader“,”@BillPascrell“,”@POTUS“, and etc. These accounts are mostly related to the Republican Party, with the exception of”@BillPascrell“, who is a Democratic member of the House of Representatives. This suggests that Republican Congress members are more likely to amplify the messages of their party and its leaders by retweeting their accounts, compared to Democratic members.

Top Accounts Retweeted by Party

Let’s also analyze with respect to party.

accounts_retweeted_top_30_d <- dbGetQuery(con, "SELECT retweeting, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND retweeting IS NOT NULL GROUP BY retweeting ORDER BY count DESC LIMIT 30")
accounts_retweeted_top_30_r <- dbGetQuery(con, "SELECT retweeting, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND retweeting IS NOT NULL GROUP BY retweeting ORDER BY count DESC LIMIT 30")
accounts_retweeted_top_30_d$party <- "Democrat"
accounts_retweeted_top_30_d$rank <- rank(-accounts_retweeted_top_30_d$count)
accounts_retweeted_top_30_r$party <- "Republican"
accounts_retweeted_top_30_r$rank <- rank(-accounts_retweeted_top_30_r$count)
accounts_retweeted_top_30_by_party <- rbind(accounts_retweeted_top_30_d, accounts_retweeted_top_30_r)
accounts_retweeted_top_30_by_party
##         retweeting count      party rank
## 1     BillPascrell  6116   Democrat    1
## 2            POTUS  5413   Democrat    2
## 3   HouseDemocrats  4422   Democrat    3
## 4   HispanicCaucus  3851   Democrat    4
## 5    SpeakerPelosi  3797   Democrat    5
## 6    WaysMeansCmte  3399   Democrat    6
## 7      RepDonBeyer  3291   Democrat    7
## 8         JoeBiden  3193   Democrat    8
## 9          thehill  3073   Democrat    9
## 10   OversightDems  3061   Democrat   10
## 11       RepCasten  2908   Democrat   11
## 12  HouseJudiciary  2707   Democrat   12
## 13    kylegriffin1  2158   Democrat   13
## 14   RepAdamSchiff  2129   Democrat   14
## 15      WhiteHouse  2082   Democrat   15
## 16      SenateDems  2078   Democrat   16
## 17          NRDems  1965   Democrat   17
## 18     AppropsDems  1915   Democrat   18
## 19  USProgressives  1888   Democrat   19
## 20    TheDemocrats  1885   Democrat   20
## 21  TheBlackCaucus  1875   Democrat   21
## 22  VetAffairsDems  1795   Democrat   22
## 23     nowthisnews  1758   Democrat   23
## 24        RepCohen  1752   Democrat   24
## 25      TeamPelosi  1725   Democrat   25
## 26 HouseBudgetDems  1688   Democrat   26
## 27    RepEspaillat  1667   Democrat   27
## 28    KamalaHarris  1577   Democrat   28
## 29    HouseForeign  1563   Democrat   29
## 30    JulianCastro  1554   Democrat   30
## 31        HouseGOP 12631 Republican    1
## 32 realDonaldTrump  6387 Republican    2
## 33       GOPLeader  6184 Republican    3
## 34         FoxNews  4814 Republican    4
## 35 WaysandMeansGOP  4371 Republican    5
## 36       SenateGOP  4153 Republican    6
## 37    SteveScalise  3736 Republican    7
## 38    JudiciaryGOP  3521 Republican    8
## 39   HouseCommerce  3344 Republican    9
## 40      WhiteHouse  3180 Republican   10
## 41    GOPoversight  2949 Republican   11
## 42   ChuckGrassley  2718 Republican   12
## 43      dcexaminer  2126 Republican   13
## 44         thehill  2099 Republican   14
## 45       RepTenney  2049 Republican   15
## 46             GOP  1977 Republican   16
## 47   westerncaucus  1973 Republican   17
## 48      Jim_Jordan  1902 Republican   18
## 49  DonaldJTrumpJr  1795 Republican   19
## 50  RealJamesWoods  1712 Republican   20
## 51  RepMalliotakis  1660 Republican   21
## 52         newsmax  1619 Republican   22
## 53   GOPChairwoman  1589 Republican   23
## 54    senatemajldr  1569 Republican   24
## 55     SpeakerRyan  1526 Republican   25
## 56 HouseForeignGOP  1517 Republican   26
## 57   charliekirk11  1472 Republican   27
## 58     HomelandGOP  1467 Republican   28
## 59  RepAndyBiggsAZ  1400 Republican   29
## 60      EdLaborGOP  1379 Republican   30
ggplot(accounts_retweeted_top_30_by_party, aes(x = count, y = reorder_within(retweeting, count, party))) +
  geom_col(aes(fill = party)) +
  geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
  facet_wrap(~party, scales = "free") +
  labs(title = "Top 30 Most Frequently Retweeted Accounts by Congress Members, by Party, 2017-2023", x = "Number of Retweets", y = "Account") +
  scale_fill_manual(values = c(D_HEX, R_HEX)) +
  scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
  theme_minimal() +
  theme(legend.position = "none")
## Don't know how to automatically pick scale for object of type <integer64>.
## Defaulting to continuous.

The most common accounts retweeted by Democratic members are “@BillPascrell”, “@POTUS”, “@HouseDemocrats”, “@HispanicCaucus”, “@SpeakerPelosi”, and etc. Meanwhile, the most common accounts retweeted by Republican members are “@HouseGOP”, “@realDonaldTrump”, “@GOPLeader”, “@FoxNews”, “@WaysandMeansGOP”, and etc. We can see that both parties are retweeting accounts that are organizations or individuals affiliated with their party, such as the House Democrats and House GOP. However, Republican members are more likely to retweet conservative news outlets such as Fox News and Newsmax, and accounts affiliated with former president Donald Trump, such as Donald Trump himself and his son Donald Trump Jr. This suggests that Republican members are more enthusiastic about its leaders, as well as conservative media outlets. On the other hand, Democratic members draw more attention on different caucuses and communities.

Top Accounts Quoted

Next, we will analyze the most frequently quoted accounts by Congress members.

accounts_quoted_top_30 <- dbGetQuery(con, "SELECT quoting, COUNT(*) FROM tweets WHERE quoting IS NOT NULL GROUP BY quoting ORDER BY count DESC LIMIT 30")
accounts_quoted_top_30$rank <- rank(-accounts_quoted_top_30$count)
accounts_quoted_top_30
##            quoting count rank
## 1  realDonaldTrump 14955    1
## 2          nytimes  8348    2
## 3          thehill  8211    3
## 4   washingtonpost  6921    4
## 5               AP  5164    5
## 6            POTUS  5157    6
## 7        GOPLeader  3825    7
## 8              CNN  3814    8
## 9          NBCNews  3704    9
## 10      WhiteHouse  3548   10
## 11        HouseGOP  3536   11
## 12        politico  3376   12
## 13    kylegriffin1  2973   13
## 14         FoxNews  2872   14
## 15             ABC  2763   15
## 16             NPR  2441   16
## 17    SteveScalise  2400   17
## 18       BillFOXLA  2076   18
## 19         CBSNews  2067   19
## 20         atrupar  2027   20
## 21     nowthisnews  2012   21
## 22       chiproytx  1978   22
## 23           axios  1931   23
## 24             WSJ  1853   24
## 25     CNNPolitics  1816   25
## 26     RNCResearch  1788   26
## 27        JoeBiden  1603   27
## 28           MSNBC  1585   28
## 29         Reuters  1566   29
## 30          mkraju  1555   30
ggplot(accounts_quoted_top_30, aes(x = count, y = reorder(quoting, count))) +
  geom_col() +
  geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
  labs(title = "Top 30 Most Frequently Quoted Accounts by Congress Members, 2017-2023", x = "Number of Quote Tweets", y = "Account") +
  scale_x_continuous(labels = scales::comma) +
  theme_minimal()

Top Accounts Quoted by Party

Let’s also analyze with respect to party.

accounts_quoted_top_30_d <- dbGetQuery(con, "SELECT quoting, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D' AND quoting IS NOT NULL GROUP BY quoting ORDER BY count DESC LIMIT 30")
accounts_quoted_top_30_r <- dbGetQuery(con, "SELECT quoting, COUNT(*) FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R' AND quoting IS NOT NULL GROUP BY quoting ORDER BY count DESC LIMIT 30")
accounts_quoted_top_30_d$party <- "Democrat"
accounts_quoted_top_30_d$rank <- rank(-accounts_quoted_top_30_d$count)
accounts_quoted_top_30_r$party <- "Republican"
accounts_quoted_top_30_r$rank <- rank(-accounts_quoted_top_30_r$count)
accounts_quoted_top_30_by_party <- rbind(accounts_quoted_top_30_d, accounts_quoted_top_30_r)
accounts_quoted_top_30_by_party
##            quoting count      party rank
## 1  realDonaldTrump 11159   Democrat    1
## 2          nytimes  7511   Democrat    2
## 3   washingtonpost  6130   Democrat    3
## 4          thehill  5742   Democrat    4
## 5            POTUS  3875   Democrat    5
## 6               AP  3779   Democrat    6
## 7              CNN  3283   Democrat    7
## 8          NBCNews  3209   Democrat    8
## 9     kylegriffin1  2897   Democrat    9
## 10        politico  2610   Democrat   10
## 11             NPR  2284   Democrat   11
## 12             ABC  2106   Democrat   12
## 13     nowthisnews  1958   Democrat   13
## 14         atrupar  1928   Democrat   14
## 15         CBSNews  1590   Democrat   15
## 16     RepDonBeyer  1519   Democrat   16
## 17     CNNPolitics  1518   Democrat   17
## 18      RepJayapal  1499   Democrat   18
## 19      WhiteHouse  1493   Democrat   19
## 20           MSNBC  1479   Democrat   20
## 21   RepValDemings  1295   Democrat   21
## 22  RepDwightEvans  1253   Democrat   22
## 23   SpeakerPelosi  1243   Democrat   23
## 24          mkraju  1219   Democrat   24
## 25        JoeBiden  1197   Democrat   25
## 26           axios  1177   Democrat   26
## 27   ChrisMurphyCT  1163   Democrat   27
## 28     brianschatz  1080   Democrat   28
## 29           cspan  1037   Democrat   29
## 30         latimes  1011   Democrat   30
## 31 realDonaldTrump  3598 Republican    1
## 32        HouseGOP  3416 Republican    2
## 33       GOPLeader  3290 Republican    3
## 34         FoxNews  2660 Republican    4
## 35         thehill  2360 Republican    5
## 36    SteveScalise  2291 Republican    6
## 37       BillFOXLA  2067 Republican    7
## 38      WhiteHouse  2050 Republican    8
## 39       chiproytx  1951 Republican    9
## 40     RNCResearch  1758 Republican   10
## 41     DailyCaller  1397 Republican   11
## 42              AP  1307 Republican   12
## 43           POTUS  1275 Republican   13
## 44 RepThomasMassie  1245 Republican   14
## 45      JohnCornyn  1229 Republican   15
## 46             WSJ  1214 Republican   16
## 47          nypost  1138 Republican   17
## 48      dcexaminer  1127 Republican   18
## 49   HouseCommerce  1082 Republican   19
## 50      RepChipRoy  1056 Republican   20
## 51       SenateGOP  1025 Republican   21
## 52     townhallcom   925 Republican   22
## 53            CNBC   861 Republican   23
## 54    GOPoversight   806 Republican   24
## 55    RepMattGaetz   764 Republican   25
## 56           axios   730 Republican   26
## 57         nytimes   728 Republican   27
## 58        politico   720 Republican   28
## 59    JudiciaryGOP   712 Republican   29
## 60 WaysandMeansGOP   710 Republican   30
ggplot(accounts_quoted_top_30_by_party, aes(x = count, y = reorder_within(quoting, count, party))) +
  geom_col(aes(fill = party)) +
  geom_text(aes(label = paste0("#", rank)), hjust = c(1.3, rep(-0.3, 29), 1.3, rep(-0.3, 29))) +
  facet_wrap(~party, scales = "free") +
  labs(title = "Top 30 Most Frequently Quoted Accounts by Congress Members, by Party, 2017-2023", x = "Number of Quote Tweets", y = "Account") +
  scale_fill_manual(values = c(D_HEX, R_HEX)) +
  scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
  theme_minimal() +
  theme(legend.position = "none")
## Don't know how to automatically pick scale for object of type <integer64>.
## Defaulting to continuous.

Top Accounts Mentioned

Lastly, we will analyze the most common accounts mentioned by Congress members in tweets.

accounts_mentioned_top_30 <- dbGetQuery(con, "SELECT mention, COUNT(*) FROM (SELECT UNNEST(mentions) AS mention FROM tweets) AS mentions GROUP BY mention ORDER BY count DESC LIMIT 30")
accounts_mentioned_top_30$rank <- rank(-accounts_mentioned_top_30$count)
accounts_mentioned_top_30
##            mention count rank
## 1  realDonaldTrump 81926    1
## 2            POTUS 81294    2
## 3   HouseDemocrats 23631    3
## 4         JoeBiden 18827    4
## 5         HouseGOP 18133    5
## 6    SpeakerPelosi 14533    6
## 7          FoxNews 11870    7
## 8       RepJayapal 10613    8
## 9     senatemajldr  9109    9
## 10   ChuckGrassley  8671   10
## 11       chiproytx  8164   11
## 12              VP  7993   12
## 13       GOPLeader  7772   13
## 14  HouseJudiciary  7436   14
## 15             AOC  7254   15
## 16      JohnCornyn  7023   16
## 17      SenSchumer  7005   17
## 18      WhiteHouse  6993   18
## 19             EPA  6838   19
## 20       SenateGOP  6780   20
## 21             GOP  6637   21
## 22            USDA  6334   22
## 23     timburchett  6182   23
## 24    KamalaHarris  6143   24
## 25          CDCgov  6009   25
## 26           MSNBC  5936   26
## 27     FoxBusiness  5756   27
## 28          SBAgov  5633   28
## 29    SteveScalise  5555   29
## 30 JoaquinCastrotx  5468   30
ggplot(accounts_mentioned_top_30, aes(x = count, y = reorder(mention, count))) +
  geom_col() +
  geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
  labs(title = "Top 30 Most Frequently Mentioned Accounts by Congress Members in Tweets, 2017-2023", x = "Number of Mentions", y = "Account") +
  scale_x_continuous(labels = scales::comma) +
  theme_minimal()

Top Accounts Mentioned by Party

Let’s also analyze with respect to party.

accounts_mentioned_top_30_d <- dbGetQuery(con, "SELECT mention, COUNT(*) FROM (SELECT UNNEST(mentions) AS mention FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'D') AS mentions GROUP BY mention ORDER BY count DESC LIMIT 30")
accounts_mentioned_top_30_r <- dbGetQuery(con, "SELECT mention, COUNT(*) FROM (SELECT UNNEST(mentions) AS mention FROM tweets JOIN accounts ON tweets.account_id = accounts.id JOIN members ON accounts.member_id = members.id WHERE party = 'R') AS mentions GROUP BY mention ORDER BY count DESC LIMIT 30")
accounts_mentioned_top_30_d$party <- "Democrat"
accounts_mentioned_top_30_d$rank <- rank(-accounts_mentioned_top_30_d$count)
accounts_mentioned_top_30_r$party <- "Republican"
accounts_mentioned_top_30_r$rank <- rank(-accounts_mentioned_top_30_r$count)
accounts_mentioned_top_30_by_party <- rbind(accounts_mentioned_top_30_d, accounts_mentioned_top_30_r)
accounts_mentioned_top_30_by_party
##            mention count      party rank
## 1            POTUS 47214   Democrat  1.0
## 2  realDonaldTrump 36678   Democrat  2.0
## 3   HouseDemocrats 21018   Democrat  3.0
## 4         JoeBiden 12067   Democrat  4.0
## 5       RepJayapal 10436   Democrat  5.0
## 6         HouseGOP  7409   Democrat  6.0
## 7    SpeakerPelosi  6799   Democrat  7.0
## 8   HouseJudiciary  6675   Democrat  8.0
## 9     senatemajldr  6648   Democrat  9.0
## 10             AOC  5905   Democrat 10.0
## 11    KamalaHarris  5495   Democrat 11.0
## 12     RepPressley  5358   Democrat 12.0
## 13 JoaquinCastrotx  5330   Democrat 13.0
## 14  AyannaPressley  5326   Democrat 14.0
## 15           MSNBC  5311   Democrat 15.0
## 16             EPA  5241   Democrat 16.0
## 17      SenSchumer  5216   Democrat 17.0
## 18       RepRaskin  5152   Democrat 18.0
## 19             GOP  5110   Democrat 19.0
## 20       SenateGOP  4943   Democrat 20.0
## 21      CoryBooker  4925   Democrat 21.0
## 22     AppropsDems  4790   Democrat 22.0
## 23   ChrisMurphyCT  4600   Democrat 23.0
## 24            USPS  4520   Democrat 24.0
## 25    RepEspaillat  4466   Democrat 25.0
## 26      RepMaloney  4436   Democrat 26.0
## 27  HispanicCaucus  4188   Democrat 27.0
## 28       SenWarren  4172   Democrat 28.0
## 29         IlhanMN  4163   Democrat 29.0
## 30    JulianCastro  4161   Democrat 30.0
## 31 realDonaldTrump 45031 Republican  1.0
## 32           POTUS 33896 Republican  2.0
## 33         FoxNews 10754 Republican  3.0
## 34        HouseGOP 10709 Republican  4.0
## 35       chiproytx  8061 Republican  5.0
## 36   ChuckGrassley  7806 Republican  6.0
## 37   SpeakerPelosi  7689 Republican  7.0
## 38        JoeBiden  6697 Republican  8.0
## 39     timburchett  6166 Republican  9.0
## 40      JohnCornyn  6148 Republican 10.0
## 41       GOPLeader  5726 Republican 11.0
## 42     FoxBusiness  5588 Republican 12.0
## 43    SteveScalise  5234 Republican 13.0
## 44              VP  4359 Republican 14.0
## 45    RepMattGaetz  4337 Republican 15.0
## 46      Jim_Jordan  4162 Republican 16.0
## 47     RepDonBacon  4124 Republican 17.0
## 48         tedcruz  3898 Republican 18.0
## 49      SenTedCruz  3689 Republican 19.0
## 50      WhiteHouse  3599 Republican 20.0
## 51            USDA  3318 Republican 21.0
## 52       RepMcCaul  3236 Republican 22.0
## 53   HouseCommerce  3218 Republican 23.0
## 54  RepAndyBiggsAZ  3208 Republican 24.0
## 55      RepChipRoy  3016 Republican 25.0
## 56    SenRickScott  2770 Republican 26.5
## 57             WSJ  2770 Republican 26.5
## 58 realdonaldtrump  2743 Republican 28.0
## 59        RandPaul  2732 Republican 29.0
## 60             CBP  2731 Republican 30.0
ggplot(accounts_mentioned_top_30_by_party, aes(x = count, y = reorder_within(mention, count, party))) +
  geom_col(aes(fill = party)) +
  geom_text(aes(label = paste0("#", rank)), hjust = -0.3) +
  facet_wrap(~party, scales = "free_y") +
  labs(title = "Top 30 Most Frequently Mentioned Accounts by Congress Members in Tweets, by Party, 2017-2023", x = "Number of Mentions", y = "Account") +
  scale_fill_manual(values = c(D_HEX, R_HEX)) +
  scale_x_continuous(labels = scales::comma, limits = c(NA, max(as.integer(accounts_mentioned_top_30_by_party$count)) * 1.05)) +
  scale_y_reordered(labels = function(x) gsub("___.*", "", x)) +
  theme_minimal() +
  theme(legend.position = "none")

Conclusion

In this analysis, we have explored the tweets of U.S. Congress members from 2017 to 2023. We have analyzed the sentiment of tweets by different metrics, and identified the most common words, hashtags, and accounts mentioned in tweets. Here are some key takeaways from our analysis:

Overall, the analysis has provided valuable insights into the communication strategies and priorities of U.S. Congress members on Twitter. I hope that this analysis will help shed light on the dynamics of political discourse on social media and inform future research on the topic.

If you have any questions or feedback, or want to build on this analysis, please feel free to create pull requests or issues on the GitHub repository. Thank you for reading!

MAJOR THANKS AND CREDITS TO THE CONGRESSTWEETS DATASET BY ALEXLITEL: https://github.com/alexlitel/congresstweets